A Comparative Approach to RNA Pseudoknotted Structure Prediction Based on Multiple Context-Free Grammar

Authors

  • Hiroyuki Seki
  • Nobuyoshi Mizoguchi
  • Yuki Kato
Abstract

Multiple context-free grammar (mcfg) [10] is a natural extension of context-free grammar (cfg) and inherits many good properties of cfg. For example, the class of languages generated by mcfg (called multiple context-free languages, or mcfl) is a substitution-closed full AFL, and the membership problem for an mcfl L is solvable in O(n^e) time, where n is the length of an input string and e is a constant determined by an mcfg that generates L.

Recently, formal language theory has been applied to structure prediction of biological sequences. For example, an RNA can be regarded as a string over four symbols (or bases) A, U, C, G, which is called a primary sequence. An RNA folds into a structure called a secondary structure, which is formed by base pairs such as A-U and C-G. There is a close correlation between the secondary structure of an RNA and its function, so predicting the secondary structure of a given RNA primary sequence is an important problem. If the secondary structure consists only of simple substructures called stem-loops, then the structure can be modeled by a derivation tree of a cfg, and the prediction can be realized by a parsing algorithm for cfg. However, there is another important substructure called a pseudoknot, which cannot be described by cfg. Hence, there have been a few studies on secondary structure prediction based on grammars whose generative power is stronger than that of cfg [11, 7]. The authors have applied a parsing algorithm for mcfl to RNA secondary structure prediction [3, 4]. However, these methods need a fair amount of training data for parameter setting. Recently, we proposed a prediction method based on comparative sequence analysis, which does not require training data [6]. In the presentation, we would like to give a quick review of mcfg, followed by the experimental results on secondary structure prediction for eight families of real RNA sequences [6].
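The distinction between stem-loops and pseudoknots can be made concrete with a small sketch. It is not the authors' prediction method. It only illustrates the standard observation that stem-loop structures have properly nested base pairs, while a pseudoknot contains two base pairs that cross. The extended dot-bracket input format and the function names `pairs_from_brackets` and `has_pseudoknot` are assumptions introduced here for illustration and do not come from the abstract.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def pairs_from_brackets(structure: str) -> List[Pair]:
    """Convert extended dot-bracket notation into sorted base pairs (i, j), i < j.

    Assumed notation (not from the abstract): '.' is an unpaired base, and
    '(' ')' and '[' ']' are matched independently, so the two bracket types
    are allowed to cross each other.
    """
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs: List[Pair] = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unclosed bracket in structure")
    return sorted(pairs)


def has_pseudoknot(pairs: List[Pair]) -> bool:
    """Return True if two base pairs (i, j) and (k, l) cross: i < k < j < l."""
    ordered = sorted(pairs)
    for idx, (i, j) in enumerate(ordered):
        for k, l in ordered[idx + 1:]:
            if i < k < j < l:
                return True
    return False


# A stem-loop: all pairs are nested, so a cfg derivation tree can model it.
stem_loop = pairs_from_brackets("((((....))))")

# Two interleaved stems: the '(' pairs and '[' pairs cross each other.
knotted = pairs_from_brackets("((..[[..))..]]")
```

Here `has_pseudoknot(stem_loop)` is false and `has_pseudoknot(knotted)` is true. The crossing dependency in the second structure is the kind of pattern that a cfg derivation tree cannot capture and that motivates grammars such as mcfg.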

Similar resources

Stochastic modeling of RNA pseudoknotted structures: a grammatical approach

MOTIVATION Modeling RNA pseudoknotted structures remains challenging. Methods have previously been developed to model RNA stem-loops successfully using stochastic context-free grammars (SCFG) adapted from computational linguistics; however, the additional complexity of pseudoknots has made modeling them more difficult. Formally a context-sensitive grammar is required, which would impose a large...

Pairwise RNA Pseudoknotted Structure Prediction Based on Stochastic Grammar

RNA secondary structure prediction is one of the major topics in bioinformatics. A prediction method based on a parsing algorithm for formal grammars is a promising approach. Also, it is expected that comparative sequence analysis achieves higher accuracy than the one using a single sequence since the former approach can use evolutionary information that homologous RNAs are likely to conserve a...

RNA Structure Prediction Including Pseudoknots Based on Stochastic Multiple Context-Free Grammar

Several grammars have been proposed for modeling RNA pseudoknotted structure. In this paper, we focus on multiple context-free grammars (MCFGs), which are a natural extension of context-free grammars and can represent pseudoknots, and extend a specific subclass of MCFGs to a probabilistic model called SMCFG. We present a polynomial time parsing algorithm for finding the most probable derivation tr...

Prediction of RNA Pseudoknotted Secondary Structure using Stochastic Context Free Grammars (SCFG)

Pseudoknots are a frequent RNA structure that assumes essential roles for varied biocatalyst cell’s functions. One of the most challenging fields in bioinformatics is the prediction of this secondary structure based on the base-pair sequence that dictates it. Previously, a model adapted from computational linguistics – Stochastic Context Free Grammars (SCFG) – has been used to predict RNA secon...

Stochastic Multiple Context-Free Grammar for RNA Pseudoknot Modeling

Several grammars have been proposed for modeling RNA pseudoknotted structure. In this paper, we focus on multiple context-free grammars (MCFGs), which are a natural extension of context-free grammars and can represent pseudoknots, and extend a specific subclass of MCFGs to a probabilistic model called SMCFG. We present a polynomial time parsing algorithm for finding the most probable derivation t...

Publication date: 2011